Let $E \subseteq \mathbb{R}$ be a measurable set and let an extended-real-valued function $f$ be defined on $E$, i.e. $f: E \to \mathbb{R} \cup \{\pm\infty\}$.
We say $f$ is a measurable function if the inverse image of any Borel set under $f$ is measurable. Formally,
Theorem 1: The following statements are equivalent, and if any of them holds, we say $f$ is a measurable function.
For any $a \in \mathbb{R}$, the set $\{x \in E : f(x) > a\}$ is measurable.
For any $a \in \mathbb{R}$, the set $\{x \in E : f(x) \ge a\}$ is measurable.
For any $a \in \mathbb{R}$, the set $\{x \in E : f(x) < a\}$ is measurable.
For any $a \in \mathbb{R}$, the set $\{x \in E : f(x) \le a\}$ is measurable.
Proof: Assume Item 1 holds. Let $a \in \mathbb{R}$. Then $\{x \in E : f(x) \ge a\} = \bigcap_{k=1}^{\infty} \{x \in E : f(x) > a - \tfrac{1}{k}\}$. Each set in the intersection is measurable, so Item 2 holds.
Assume Item 2 holds. The set in Item 3 is the complement (relative to $E$) of the set in Item 2. Hence Item 3 holds.
Assume Item 3 holds. By an approach similar to the proof that Item 1 implies Item 2, namely $\{x \in E : f(x) \le a\} = \bigcap_{k=1}^{\infty} \{x \in E : f(x) < a + \tfrac{1}{k}\}$, we obtain that Item 4 holds.
Assume Item 4 holds. The set in Item 1 is the complement (relative to $E$) of the set in Item 4. So Item 1 holds.
Corollary 2: If $f$ is a measurable function defined on the measurable set $E$, then for any $a \in \mathbb{R} \cup \{\pm\infty\}$, the set $\{x \in E : f(x) = a\}$ is measurable.
Proof: If $a \in \mathbb{R}$, note $\{x \in E : f(x) = a\} = \{x \in E : f(x) \ge a\} \cap \{x \in E : f(x) \le a\}$. The two sets on the right hand side are measurable, so is the one on the left hand side.
If $a = +\infty$, consider $\{x \in E : f(x) = +\infty\} = \bigcap_{k=1}^{\infty} \{x \in E : f(x) > k\}$. Each set in the intersection is measurable, so the intersection is measurable. Similarly, the conclusion holds when $a = -\infty$.
Corollary 3: $f$ is a measurable function defined on the measurable set $E$ if and only if, for any open set $O \subseteq \mathbb{R}$, $f^{-1}(O)$ is measurable.
Proof: For any $a \in \mathbb{R}$, $(a, +\infty)$ is an open set, so $\{x \in E : f(x) > a\} = f^{-1}((a, +\infty))$ is measurable. Hence $f$ is a measurable function.
Conversely, suppose $f$ is measurable and consider an open set $O$. It can be expressed as a union of countably many disjoint open intervals, i.e. $O = \bigcup_k (a_k, b_k)$. Let $A_k = \{x \in E : f(x) > a_k\}$ and $B_k = \{x \in E : f(x) < b_k\}$. Then $A_k$ and $B_k$ are both measurable sets, and $f^{-1}((a_k, b_k)) = A_k \cap B_k$ is also measurable.
So $f^{-1}(O) = \bigcup_k (A_k \cap B_k)$ is measurable.
Theorem 4 (Continuity implies measurability): Let $f$ be a continuous function on a measurable set $E$. Then $f$ is measurable on $E$.
Proof: If $f$ is continuous, for any open set $O$ we have $f^{-1}(O) = E \cap U$ where $U$ is open. Thus $f^{-1}(O)$ is measurable. By Corollary 3, $f$ is measurable.
Theorem 5: Let $f$ be a monotonic function on an interval $I$. Then $f$ is measurable.
Proof: Assume $f$ is increasing on $I$.
Suppose $I$ is an open interval $(\alpha, \beta)$ where $\alpha < \beta$. For any $a \in \mathbb{R}$, if $a \ge \sup_{x \in I} f(x)$, then the set $\{x \in I : f(x) > a\} = \varnothing$ is measurable. If $a < \inf_{x \in I} f(x)$, the set $\{x \in I : f(x) > a\} = I$ is measurable. Otherwise, let $c = \inf\{x \in I : f(x) > a\}$; since $f$ is increasing, the set $\{x \in I : f(x) > a\}$ is either $(c, \beta)$ or $[c, \beta)$, and in either case it is measurable.
Suppose $I$ is a left-closed-right-open interval $[\alpha, \beta)$, where $\alpha < \beta$. For any $a \in \mathbb{R}$, the same three cases apply: the set $\{x \in I : f(x) > a\}$ is either an interval ending at $\beta$, the whole of $I$, or the empty set, and each is measurable.
Similar statements apply to the cases $(\alpha, \beta]$ and $[\alpha, \beta]$.
If $f$ is decreasing, similarly we can show that it is also measurable.
Next we ask: if $f$ is measurable, and other functions bear certain relationships to $f$, what can we say about their measurability?
Theorem 6: $f$ is a function on a measurable set $E$.
If $g$ is measurable on $E$ and $f = g$ almost everywhere on $E$, then $f$ is measurable on $E$.
Let $E_1$ be a measurable subset of $E$. Then $f$ is measurable on $E$ if and only if $f$ is measurable on both $E_1$ and $E \setminus E_1$.
Proof: For statement (1), let $Z = \{x \in E : f(x) \ne g(x)\}$, so that $m(Z) = 0$. Then for any $a \in \mathbb{R}$, the set $\{x \in E : f(x) > a\} = \big(\{x \in E : g(x) > a\} \setminus Z\big) \cup \{x \in Z : f(x) > a\}$; the first set is measurable, and the second is a subset of a set of measure zero, hence measurable. This proves statement (1).
For statement (2), for any $a \in \mathbb{R}$ let $A_1 = \{x \in E_1 : f(x) > a\}$ and $A_2 = \{x \in E \setminus E_1 : f(x) > a\}$. If $f$ is measurable on $E$ then $\{x \in E : f(x) > a\}$ is measurable. Thus both $A_1 = \{x \in E : f(x) > a\} \cap E_1$ and $A_2 = \{x \in E : f(x) > a\} \setminus E_1$ are measurable, which implies $f$ is measurable on both $E_1$ and $E \setminus E_1$. Conversely, if $f$ is measurable on $E_1$ and $E \setminus E_1$, the sets $A_1$ and $A_2$ are measurable. So $\{x \in E : f(x) > a\} = A_1 \cup A_2$ is measurable.
We shall from now on consider only real-valued functions, i.e. $f: E \to \mathbb{R}$, from a practical point of view and to simplify a lot of proofs.
Theorem 7: $f, g$ are measurable functions on $E$.
For any $\alpha, \beta \in \mathbb{R}$, $\alpha f + \beta g$ is measurable on $E$.
$fg$ is measurable on $E$.
Proof: For statement (1), it suffices to prove (1a) $\alpha f$ is measurable, and (1b) $f + g$ is measurable.
For any $a \in \mathbb{R}$: if $\alpha > 0$, the set $\{x \in E : \alpha f(x) > a\} = \{x \in E : f(x) > a/\alpha\}$ is measurable; if $\alpha < 0$, it equals $\{x \in E : f(x) < a/\alpha\}$, also measurable; the case $\alpha = 0$ is trivial. This proves (1a).
Now consider the set $\{x \in E : f(x) + g(x) > a\}$. For any $x$ in this set, $f(x) > a - g(x)$. Note the set of rationals $\mathbb{Q}$ is dense in $\mathbb{R}$. So there is a $q \in \mathbb{Q}$ s.t. $f(x) > q > a - g(x)$. Thus
$$\{x \in E : f(x) + g(x) > a\} = \bigcup_{q \in \mathbb{Q}} \big(\{x \in E : f(x) > q\} \cap \{x \in E : g(x) > a - q\}\big) \tag{1}$$
Each component under the union is measurable and $\mathbb{Q}$ is countable. So the set on the left hand side of (1) is also measurable. This proves (1b).
For statement (2), we consider $fg = \frac{1}{4}\big[(f+g)^2 - (f-g)^2\big]$. Since we have established statement (1), it remains to prove that $f^2$ is measurable whenever $f$ is.
Consider the set $\{x \in E : f^2(x) > a\}$. If $a < 0$, it equals $E$, which is measurable. If $a \ge 0$, it equals $\{x \in E : f(x) > \sqrt{a}\} \cup \{x \in E : f(x) < -\sqrt{a}\}$, which is also measurable.
Now we consider composition.
Theorem 8: Let $g$ be a measurable function on $E$ and let $f$ be continuous on $\mathbb{R}$. Then the composition $f \circ g$ is measurable on $E$.
Proof: Consider any open set $O$. Then $(f \circ g)^{-1}(O) = g^{-1}(f^{-1}(O))$. Since $f$ is continuous on $\mathbb{R}$, $f^{-1}(O)$ is open. Thus by Corollary 3, $g^{-1}(f^{-1}(O))$ is measurable. Again by Corollary 3, $f \circ g$ is a measurable function.
Theorem 9: $f$ and $g$ are measurable on $E$. Then
$\max\{f, g\}$ is measurable.
$\min\{f, g\}$ is measurable.
$f^+ = \max\{f, 0\}$ is measurable.
$f^- = \max\{-f, 0\}$ is measurable.
$|f|$ is measurable.
Proof: The set $\{x \in E : \max\{f, g\}(x) > a\} = \{x \in E : f(x) > a\} \cup \{x \in E : g(x) > a\}$ is measurable. This proves statement (1).
Similarly, the set $\{x \in E : \min\{f, g\}(x) > a\} = \{x \in E : f(x) > a\} \cap \{x \in E : g(x) > a\}$ is measurable. This proves statement (2).
The set $\{x \in E : f^+(x) > a\}$ equals $\{x \in E : f(x) > a\}$ if $a \ge 0$, and equals $E$ if $a < 0$. In either case, it is measurable. This proves statement (3).
The set $\{x \in E : f^-(x) > a\}$ equals $\{x \in E : f(x) < -a\}$ if $a \ge 0$, and equals $E$ if $a < 0$. In either case, it is measurable. This proves statement (4).
For statement (5), note $|f| = f^+ + f^-$. By (3), (4) and Theorem 7 (linearity preserves measurability), $|f|$ is measurable.
A CFG $G$ is said to be in Greibach Normal Form (GNF) if all its rules are of the following form:
$X \rightarrow aY_1Y_2\cdots Y_k$, where $X, Y_1, \dots, Y_k$ are nonterminals, $a$ is some terminal, and $k \ge 0$.
And if $\epsilon \in L(G)$, then we can include
$S \rightarrow \epsilon$, and $S$ does not appear on the right hand side of any rule.
Note in GNF, no left recursion can happen. This will be helpful for some top-down parsers.
To show that any CFG can be converted to GNF, we will construct an elegant mapping with the help of the least fixed point property.
Least Fixed Point
Let $(D, \sqsubseteq)$ be a partially ordered set. We say a sequence $d_0, d_1, d_2, \dots$ is an $\omega$-chain if $d_n \sqsubseteq d_{n+1}$ for all $n \ge 0$. If it exists, the least upper bound of an $\omega$-chain is denoted by $\bigsqcup_{n \ge 0} d_n$.
We say $D$ is a complete poset iff it has a least element $\bot$ and every $\omega$-chain has a least upper bound in $D$.
Now let $D$ and $E$ be complete posets. A function $f: D \rightarrow E$ is said to be $\omega$-continuous iff
it is monotonic, i.e. if $d \sqsubseteq d'$, then $f(d) \sqsubseteq f(d')$;
$f\big(\bigsqcup_{n \ge 0} d_n\big) = \bigsqcup_{n \ge 0} f(d_n)$, for every $\omega$-chain $d_0 \sqsubseteq d_1 \sqsubseteq \cdots$
Theorem 1 (Least Fixed Point): Let $D$ be a complete poset. Then every $\omega$-continuous function $f: D \rightarrow D$ has a least fixed point.
Proof: Let $\bot$ be the least element. We will use $f^0(\bot) = \bot$ and $f^{n+1}(\bot) = f(f^n(\bot))$.
The trick is to construct an $\omega$-chain as follows:
$$\bot \sqsubseteq f(\bot) \sqsubseteq f^2(\bot) \sqsubseteq \cdots$$
To show it is an $\omega$-chain, note that $f$ is $\omega$-continuous, hence it is monotonic. Since $\bot$ is the least element, we have $\bot \sqsubseteq f(\bot)$. Since $f$ is monotonic, we have $f(\bot) \sqsubseteq f^2(\bot)$. Continuing by induction on $n$, we prove the chain is an $\omega$-chain.
Since $D$ is a complete poset, the chain has a least upper bound in $D$, denoted by $d^* = \bigsqcup_{n \ge 0} f^n(\bot)$.
$d^*$ is the least fixed point we are looking for. To see this, recall $f$ is $\omega$-continuous. Thus $f(d^*) = f\big(\bigsqcup_{n} f^n(\bot)\big) = \bigsqcup_{n} f^{n+1}(\bot) = d^*$, which shows $d^*$ is a fixed point. For any fixed point $e$, since $\bot \sqsubseteq e$ we have $f^n(\bot) \sqsubseteq f^n(e) = e$ for any $n$. Thus $d^* \sqsubseteq e$. This proves $d^*$ is the least fixed point.
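The iteration in this proof can be run directly on a finite complete poset. Below is a minimal sketch (the lattice, the function, and all names are illustrative assumptions, not from the note): we take a powerset ordered by inclusion, with $\bot = \varnothing$, and iterate a monotonic function until the chain stabilizes.

```python
# Kleene iteration: least fixed point of a monotonic function on a
# finite powerset lattice (bottom = empty set, order = set inclusion).
# The specific lattice and step function below are illustrative assumptions.

def least_fixed_point(f, bottom=frozenset()):
    """Iterate bottom, f(bottom), f(f(bottom)), ... until it stabilizes."""
    current = bottom
    while True:
        nxt = f(current)
        if nxt == current:          # reached the fixed point
            return current
        current = nxt

# Example: reachability in a tiny graph, a classic monotonic operator.
edges = {1: [2], 2: [3], 3: [3], 4: [1]}

def step(reached):
    # start from node 1, then add successors of already-reached nodes
    out = set(reached) | {1}
    for node in reached:
        out.update(edges.get(node, []))
    return frozenset(out)

print(sorted(least_fixed_point(step)))  # nodes reachable from 1
```

The loop terminates on any finite lattice because the chain is strictly increasing until it hits the fixed point.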
Derivations at “Equilibrium”
For a CFG $G$, let $X_1, \dots, X_n$ be its nonterminals. We can rearrange its production rules and group them by nonterminals as:
$X_1 \rightarrow \alpha_{11} + \alpha_{12} + \cdots$, …, $X_n \rightarrow \alpha_{n1} + \alpha_{n2} + \cdots \tag{1}$
where each $\alpha_{ij} \in V^*$ and "$+$" means "or". For example, if we have two rules $X \rightarrow aXb$ and $X \rightarrow c$, we can write $X \rightarrow aXb + c$. This notational simplicity later turns out to be an elegant way to represent CFG derivation by mimicking matrix multiplication and addition.
If we start deriving from some nonterminal, say $X_i$, what would $L(X_i)$ look like, and how are its elements added to the set gradually?
At step 0, where no nonterminal has been rewritten yet, $L(X_i)$ can only include the right-hand sides of $X_i$'s rules that are pure terminal strings, i.e. those containing no nonterminals. The same goes for every other nonterminal. So "after step 0", the language derived from $X_i$ contains some initial strings only. Let's move 1 step further and look at $L(X_i)$ again. Now the nonterminals in its rules start to "unfold". For example, if $X_j$ appears in some $\alpha_{ik}$, then it unfolds to all possible strings derived from $X_j$ at step 0. The same goes for all the other nonterminals.
So at each step, the language grows bigger and bigger. You may feel that there is some "limit" to how large $L(X_i)$ can grow, and some "equilibrium" beyond which all the $L(X_i)$ "stop" deriving new strings. In the limit, the final tuple of languages will be some sort of "fixed point".
The above idea is exactly what we will formalize next, utilizing the least fixed point theorem.
Formally, let's give more notation and definitions.
Let $V$ be the set of symbols and $\Sigma \subseteq V$ be the set of terminals. We use $N = V \setminus \Sigma$ to denote the set of nonterminals. Note a language is a subset of strings in $\Sigma^*$. Similarly, we use the letter $\alpha$ to denote a finite string of nonterminals and terminals in $V$, i.e. $\alpha \in V^*$.
For any tuple of languages $L = (L_1, \dots, L_n)$ we define a function $\hat{\alpha}$ as follows:
$\hat{a}(L) = \{a\}$, for all terminals $a \in \Sigma$
$\hat{X_i}(L) = L_i$, where $X_i$ is a nonterminal
$\widehat{\alpha\beta}(L) = \hat{\alpha}(L) \cdot \hat{\beta}(L)$, where $\alpha, \beta \in V^*$ and the dot operation means concatenating every pair of strings from the two sets. Sometimes we omit the dot sign for simplicity.
$\widehat{\alpha + \beta}(L) = \hat{\alpha}(L) + \hat{\beta}(L)$, where $\alpha, \beta \in V^*$ and the plus operation means "or", i.e. we union the 2 sets.
You can easily verify that $\hat{\alpha}$ is a well defined function. Now we use it to define a mapping $F: (2^{\Sigma^*})^n \rightarrow (2^{\Sigma^*})^n$. Assume the rules in grammar $G$ are written in the form of (1) above, and let
$$F(L_1, \dots, L_n) = \big(\widehat{\alpha_{11} + \alpha_{12} + \cdots}(L),\ \dots,\ \widehat{\alpha_{n1} + \alpha_{n2} + \cdots}(L)\big) \tag{2}$$
The definition in (2) is a formal way to specify what we have discussed. Given a tuple of languages $L$, we want to use it to expand all nonterminals in the rules and thus obtain a new tuple of languages $F(L)$.
So if we start with $(\varnothing, \dots, \varnothing)$ and apply $F$, we will obtain a tuple where each component is the language derived from $X_i$ after 1 step. We keep doing this and eventually obtain $F^k(\varnothing, \dots, \varnothing)$ for every $k$. As you can guess, the limit is exactly the tuple of derived languages $(L(X_1), \dots, L(X_n))$, and it is the least fixed point of $F$.
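The iteration can be watched concretely. The sketch below (grammar, encoding, and the length cap are illustrative assumptions; the cap keeps the sets finite) applies one step of the mapping $F$ from (2) repeatedly, starting from the all-empty tuple:

```python
# One application of F: substitute the current language tuple into every
# right-hand side. Uppercase symbols are nonterminals, lowercase are
# terminals; MAX_LEN truncates so the iteration reaches a fixed point.

MAX_LEN = 4

# S -> a S | b
rules = {"S": [["a", "S"], ["b"]]}

def F(langs):
    new = {}
    for nt, alts in rules.items():
        out = set()
        for alt in alts:
            partial = {""}                      # pairwise concatenation
            for sym in alt:
                pieces = {sym} if sym.islower() else langs[sym]
                partial = {u + v for u in partial for v in pieces
                           if len(u + v) <= MAX_LEN}
            out |= partial
        new[nt] = out
    return new

langs = {"S": set()}          # bottom element: all components empty
while True:                   # Kleene iteration up to the length cap
    nxt = F(langs)
    if nxt == langs:
        break
    langs = nxt
print(sorted(langs["S"]))     # strings of L(S) with length <= MAX_LEN
```

Each pass adds the strings derivable with one more rewriting step, exactly the "language grows bigger and bigger" picture above.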
We shall prove three lemmas, which will lead to our desired result in a theorem.
Note for any set $S$, $\subseteq$ is a partial order on $2^S$. So in our case, a partial order relation on tuples is defined as $L \sqsubseteq L'$ iff $L_i \subseteq L_i'$ for every $i$, and $L = L'$ iff for every $i$ we have $L_i = L_i'$.
Lemma 2: $F$ is an $\omega$-continuous function.
Proof: Let $L = (L_1, \dots, L_n)$ and $L' = (L_1', \dots, L_n')$. Suppose $L \sqsubseteq L'$; we want to show $F(L) \sqsubseteq F(L')$. Consider their i-th components: $\widehat{\alpha_{i1} + \alpha_{i2} + \cdots}(L)$ and $\widehat{\alpha_{i1} + \alpha_{i2} + \cdots}(L')$.
It suffices to show that for every $\alpha_{ij}$, we have
$$\hat{\alpha}_{ij}(L) \subseteq \hat{\alpha}_{ij}(L') \tag{1}$$
Let $\alpha_{ij} = c_1 c_2 \cdots c_m$, where $c_t \in V$ for $1 \le t \le m$ and $m \ge 0$. Note $m = 0$ means this is an $\epsilon$-rule; then clearly (1) holds. If $m \ge 1$, then $\hat{\alpha}_{ij}(L) = \hat{c}_1(L) \cdots \hat{c}_m(L)$. If $c_t$ is a terminal, then $\hat{c}_t(L) = \{c_t\} = \hat{c}_t(L')$. If $c_t$ is a nonterminal $X_k$, then $\hat{c}_t(L) = L_k$, and by assumption we have $L_k \subseteq L_k'$. Thus we conclude that (1) holds and $F$ is monotonic.
Next, consider any $\omega$-chain $L^{(0)} \sqsubseteq L^{(1)} \sqsubseteq \cdots$ and let $L^* = \bigsqcup_{t \ge 0} L^{(t)}$. We need to show $F(L^*) = \bigsqcup_{t \ge 0} F(L^{(t)})$, i.e. that for every $i$ the i-th components are equal. Let $P$ be the i-th component of $F(L^*)$ and $Q$ be the i-th component of $\bigsqcup_{t \ge 0} F(L^{(t)})$. Then we need to show $P \subseteq Q$ and $Q \subseteq P$.
First, consider any $w \in P$. If we inspect $P$ closely, this set is obtained by rewriting each nonterminal in some $\alpha_{ij}$ with strings from the corresponding languages in $L^*$, and $w$ is built from finitely many such strings. Since the chain is increasing, there must be a single $t$ such that all of these finitely many strings already appear in $L^{(t)}$. Then $w$ lies in the i-th component of $F(L^{(t)})$. This shows $P \subseteq Q$.
Conversely, if $w \in Q$, there is some $t$ such that $w$ is in the i-th component of $F(L^{(t)})$. Since $L^{(t)} \sqsubseteq L^*$ and $F$ is monotonic, $w \in P$. This completes the proof.
By Theorem 1 (Least fixed point) and Lemma 2, $F$ has a least fixed point. It is constructed as $\bigsqcup_{k \ge 0} F^k(\bot)$.
Note $\bot = (\varnothing, \dots, \varnothing)$ is the least element. Thus the least fixed point of $F$ is $\bigsqcup_{k \ge 0} F^k(\varnothing, \dots, \varnothing)$.
Now we want to prove that this least fixed point equals $(L(X_1), \dots, L(X_n))$. This is established by the following two lemmas.
Lemma 3: $F^k(\bot)_i \subseteq L(X_i)$ for all $k \ge 0$ and all $i$.
Proof: We prove by induction on $k$. For $k = 1$, note $w \in F(\bot)_i$ if and only if $X_i \rightarrow w$ is a production rule with $w$ a terminal string. Thus $w$ can be derived from $X_i$ in one step and we clearly have $F(\bot)_i \subseteq L(X_i)$.
Assume it holds for $k$ for every $i$. Consider $w \in F^{k+1}(\bot)_i$. Note $F^{k+1}(\bot) = F(F^k(\bot))$. So $w \in \hat{\alpha}_{ij}(F^k(\bot))$ for some $\alpha_{ij} = c_1 \cdots c_m$, where each $c_t$ is a terminal or a nonterminal. By the induction assumption, for each $c_t$ that is a nonterminal (with a bit of abuse of notation, letting $w_t$ denote the substring of $w$ contributed by $c_t$),
we have $w_t \in F^k(\bot)$ for the corresponding component, hence $c_t$ derives $w_t$ in finitely many steps. Thus $\alpha_{ij}$ can derive $w$ in finitely many steps, and so $X_i$ can derive $w$. Hence $F^{k+1}(\bot)_i \subseteq L(X_i)$.
Lemma 4: $L(X_i) \subseteq \bigcup_{k \ge 0} F^k(\bot)_i$, for all $i$.
Proof: We prove by induction on derivation length $\ell$. If $\ell = 1$, then there must be a production rule $X_i \rightarrow w$, hence $w \in F(\bot)_i$. Now assume the claim holds for all derivation lengths less than $\ell$ and all $i$. For a derivation of length $\ell$, there must be some rule $X_i \rightarrow \alpha_{ij}$ from which $w$ is reached. So in the derivation path there must exist a sub-path for each nonterminal $c_t$ of $\alpha_{ij}$ (note we can always turn a derivation into an equivalent leftmost derivation) such that $c_t$ derives its portion $w_t$ of $w$. The length of each such sub-path is less than $\ell$. Thus by the induction assumption, each $w_t \in F^{k_t}(\bot)$ for some $k_t$. Now take $k = \max_t k_t$. Since the chain is increasing, each $w_t \in F^k(\bot)$, and hence $w \in F^{k+1}(\bot)_i$.
Finally, with all the discussions above, we arrive at
Theorem 5: $(L(X_1), \dots, L(X_n))$ is the least fixed point of $F$.
Convert to Weak Greibach Normal Form
It is not difficult to see that if we are able to achieve rules of the form $X \rightarrow a\alpha$, where $a$ is a terminal and $\alpha$ can be any string, then we will be able to further convert the grammar to the strong Greibach Normal Form mentioned at the beginning of this note. The above form is called Weak Greibach Normal Form. As long as the first symbol is a terminal, we eliminate the possibility of left recursion. To achieve this, we need to introduce more "notation tricks" to represent production rules as matrix operations.
For a CFG $G$, as above we assume $G$ contains nonterminals $X_1, \dots, X_n$ and write $X = (X_1, \dots, X_n)$. We can group productions by their left-hand-side nonterminals as in (1).
Furthermore, on the right hand side, we can group the alternatives of each rule by their first symbol.
If we write it in "matrix" form, (1) will become $X = XA + B$, where $X$ is a row vector $(X_1, \dots, X_n)$, $A$ is an $n \times n$ matrix and each of its entries is a (possibly empty) finite subset of $V^*$. Note if an entry $A_{ij}$ is the empty set, then $X_i A_{ij} = \varnothing$. Finally, $B$ is also a row vector where each entry is a finite set of strings (possibly empty) that start with a terminal symbol.
So the set of production rules can be expressed as a matrix equation $X = XA + B$. We normally use $L$ to represent variables, so it becomes
$$L = LA + B \tag{3}$$
Languages over $\Sigma$ (or $V$) form a semiring with
Addition $+$: the union operation, i.e. $L_1 + L_2 = L_1 \cup L_2$
Multiplication $\cdot$: concatenation, i.e. $L_1 \cdot L_2 = \{uv : u \in L_1, v \in L_2\}$
Addition identity $0$: this is the empty set $\varnothing$
Multiplication identity $1$: this is the set $\{\epsilon\}$
For a matrix $A$, we let $A^* = \sum_{n \ge 0} A^n$, where $A^0$ is the identity matrix.
We look at (3) again: in fact we can let $F(L) = LA + B$. Theorem 5 says that the tuple of derived languages is the least fixed point of this $F$.
You can also find out that the least fixed point is actually obtained by applying $F$ an infinite number of times. Be careful that in (3), $A$ and $B$ can in general also contain nonterminals, and then applying $F$ must expand those nonterminals as well.
Now if we assume $A$ and $B$ do not contain any nonterminal, then the least fixed point of $F(L) = LA + B$ can be found by following the procedure in Theorem 1. Applying $F$ to $\bot$ once we have $B$. Applying it twice yields $BA + B$. Applying it $k + 1$ times yields $BA^k + \cdots + BA + B$. Thus, in the limit we get the least fixed point as $BA^*$.
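As a sanity check, one can verify directly, using only the semiring laws listed above, that $BA^*$ is indeed a solution of $L = LA + B$:

```latex
BA^{*}A + B
  \;=\; B\Big(\sum_{n \ge 0} A^{n}\Big)A + B
  \;=\; B\sum_{n \ge 1} A^{n} + B
  \;=\; B\sum_{n \ge 0} A^{n}
  \;=\; BA^{*}
```

The missing $n = 0$ term $BA^0 = B$ is exactly what the "$+ B$" restores, which is why the telescoping works.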
Lemma 6: A grammar whose production rules are expressed as $X = XA + B$, where $A$ and $B$ do not contain any nonterminal, has the least fixed point $BA^*$.
Proof: It has been shown above.
Now the magic to turn any grammar into weak Greibach form is to create new nonterminals $Y$ and rewrite the production rules from $X = XA + B$ to $X = BY + B$, $Y = AY + A$. Note $Y$ in the above is an $n \times n$ matrix of new nonterminals.
We call this new grammar $G'$. We let $L$ denote the least fixed point of $F$, and $(L', M')$ denote the least fixed point of $F'$ (where $L'$ corresponds to the $X$ components, and $M'$ to the $Y$ components). The elegance is that $L = L'$, and hence $G$ and $G'$ produce the same language. After this operation, the rules for $X$ in $G'$ all start with a terminal (we may assume no $\epsilon$-rule or chain rule exists by first converting to CNF). To further make the rules for $Y$ all start with a terminal symbol, notice that the entries of $A$ only contain terminals and nonterminals among the $X_i$. So once the fixed point is reached, we simply substitute the $X$-rules, which now start with terminals, for the leading nonterminals in the $Y$-rules, and the result meets our needs.
Now it remains to prove that $L = L'$.
Theorem 7: Let $F$ be expressed as $F(L) = LA + B$
and $F'$ be expressed as $F'(L, M) = (BM + B, AM + A)$.
If $L$ is the least fixed point of $F$ and $(L', M')$ is the least fixed point of $F'$, then we have $L = L'$.
Proof: Since is the least fixed point of , we have . Because contains no nonterminals, by Lemma 6 the least fixed point of is . Therefore, (since by (1) is also a solution to is ).
Note the functions $F$ and $F'$ are both monotonic; by (2) we have
In the least fixed point theorem, for any with we have . So if we let , then (3) is showing with . Therefore, . Combining (2) and (4) we get
Similarly for , since is the least fixed point we have
Again using the trick of Lemma 6 and the fact that does not contain any nonterminal, we have the least fixed point of is . But (6) exactly states that is also a solution. So we have
Now notice . Again we have and and . Since is the least fixed point of , we have . Combining (7) and (8) we have
To proceed, observe that is a solution to . In fact, is true because by (5) we have . Also is an identity. Since is the least fixed point of , we must have .
On the other hand, using first half of (6) and result of (9) we get . We can see that is a solution to by (10). In fact,
. Since is the least fixed point of , we must have . This completes the proof.
A CFG is said to be in Chomsky Normal Form (CNF), if all its productions are only of the following forms:
$X \rightarrow YZ$, where $X, Y, Z$ are non-terminals
$X \rightarrow a$, where $a$ is a terminal and $X$ a non-terminal
$S \rightarrow \epsilon$ if $\epsilon \in L(G)$, and then $S$ never occurs on the right hand side of any production rule
The above is saying you either produce 2 non-terminals, or produce 1 terminal. And there will be no "$\epsilon$-rules" and no "chain rules". The only possible $\epsilon$-production is directly from the start symbol.
We can build an algorithm to convert any CFG to an equivalent one in CNF. The idea is to do it in 3 steps: (1) eliminate $\epsilon$-rules; (2) eliminate chain rules; (3) restrict all right hand sides to either 2 non-terminals or 1 terminal.
Eliminate $\epsilon$-rules
For a non-terminal $X$, if there is some chain that eventually leads to $\epsilon$, i.e. $X \Rightarrow^* \epsilon$, then we call $X$ "erasable". If we eliminate all $\epsilon$-rules, then we need to effectively eliminate all "erasable chains" as well.
Formally, for a CFG $G$ we use $\mathcal{E}$ to denote the set of erasable non-terminals. We construct $\mathcal{E}$ inductively as follows:
$\mathcal{E}_0 = \{X : X \rightarrow \epsilon \text{ is a rule}\}$, and $\mathcal{E}_{k+1} = \mathcal{E}_k \cup \{X : X \rightarrow \alpha \text{ is a rule with } \alpha \in \mathcal{E}_k^*\}$. We can see $\mathcal{E}_0 \subseteq \mathcal{E}_1 \subseteq \cdots$. We then let $\mathcal{E} = \bigcup_k \mathcal{E}_k$.
Since $G$ only has finitely many non-terminals, there must be a max $k$ such that $\mathcal{E} = \mathcal{E}_k$. We can formally prove $\mathcal{E}$ contains all erasable non-terminals. The following theorem gives a procedure to formally remove all $\epsilon$-rules while letting the new grammar generate the same language.
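The inductive construction of the erasable set is itself a small fixed-point computation. A minimal sketch (the grammar and its string encoding are illustrative assumptions: uppercase letters are non-terminals, lowercase are terminals, `""` is $\epsilon$):

```python
# Fixed-point computation of the erasable (nullable) non-terminals:
# iterate E_0 ⊆ E_1 ⊆ ... until no new non-terminal is added.

rules = {
    "S": ["AB", "a"],
    "A": ["", "aA"],     # A -> eps | aA   (A is erasable directly)
    "B": ["A", "b"],     # B -> A | b      (B is erasable via A)
}

def erasable(rules):
    e = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in rules.items():
            # X joins e if some alternative consists only of erasable symbols
            if nt not in e and any(all(c in e for c in alt) for alt in alts):
                e.add(nt)
                changed = True
    return e

print(sorted(erasable(rules)))
```

Note that `S` ends up erasable too, through the chain $S \Rightarrow AB \Rightarrow^* \epsilon$, which is exactly the "erasable chain" the construction must capture.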
Theorem 1 (Eliminate $\epsilon$-rules): For any CFG $G$, there is an equivalent $G'$ such that
there are no rules of the form $X \rightarrow \epsilon$,
except that if $\epsilon \in L(G)$, the only possible such rule is $S' \rightarrow \epsilon$, and $S'$ does not appear on the right hand side of any rule.
Proof: If $\epsilon \in L(G)$, we create a new start symbol $S'$ and add $S' \rightarrow S$ and $S' \rightarrow \epsilon$ to the rules. This makes sure that $S'$ will never occur on the right hand side of any rule.
We proceed to eliminate all $\epsilon$-rules, keeping all non-$\epsilon$-rules of $G$ as a subset of the new rule set.
Now we need to add some new rules to maintain the "erasability" of original symbols. This is done by constructing a set of new rules: for each rule $X \rightarrow \alpha$, add $X \rightarrow \alpha'$ for every non-empty $\alpha'$ obtained from $\alpha$ by deleting some occurrences of erasable non-terminals.
So the new rule set consists of the non-$\epsilon$-rules, the added rules, and, if $\epsilon \in L(G)$, the rules $S' \rightarrow S$ and $S' \rightarrow \epsilon$. This completes the proof.
Note in the above, some $\alpha$ can contain several erasable non-terminals. You should convince yourself that the construction indeed includes all possible new rules.
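The new-rule construction can be sketched mechanically: each erasable occurrence is independently kept or dropped, and the empty variants are discarded. The grammar and encoding below are illustrative assumptions (same convention as before: uppercase non-terminals, `""` for $\epsilon$):

```python
# For each rule X -> alpha, emit every non-empty variant of alpha obtained
# by deleting some subset of its erasable symbols; eps-rules are dropped.

from itertools import product

def eliminate_eps(rules, erasable):
    new_rules = {}
    for nt, alts in rules.items():
        out = set()
        for alt in alts:
            # each erasable symbol may be kept or replaced by ""
            choices = [(c, "") if c in erasable else (c,) for c in alt]
            for picks in product(*choices):
                variant = "".join(picks)
                if variant:                 # discard the eps-rule itself
                    out.add(variant)
        new_rules[nt] = sorted(out)
    return new_rules

rules = {"S": ["AbA"], "A": ["", "a"]}
print(eliminate_eps(rules, erasable={"A"}))
```

For `S -> AbA` with `A` erasable, the four variants `AbA`, `Ab`, `bA`, `b` appear, matching the remark that a right-hand side with several erasable non-terminals yields all combinations.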
Eliminate Chain Rules
Given any CFG $G$, with Theorem 1 we obtain an equivalent $G'$ with no $\epsilon$-rules. The next step is to eliminate chain rules, i.e. rules of the form $X \rightarrow Y$ with $Y$ a non-terminal.
Here we construct inductively, for a non-terminal $X$, its set of chainable non-terminals: $C_0(X) = \{X\}$, and $C_{k+1}(X) = C_k(X) \cup \{Z : Y \rightarrow Z \text{ is a rule with } Y \in C_k(X) \text{ and } Z \text{ a non-terminal}\}$.
So $C_0(X) \subseteq C_1(X) \subseteq \cdots$. There is a max $k$ such that $C(X) = C_k(X)$.
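This is again a small closure computation: collect every non-terminal reachable from $X$ through unit rules alone. A minimal sketch under the same illustrative encoding assumptions as above:

```python
# Chainable set C(X): non-terminals reachable from X via unit rules only.

rules = {"S": ["A", "ab"], "A": ["B", "a"], "B": ["b"]}

def chainable(rules, start):
    c = {start}
    changed = True
    while changed:                     # C_0 ⊆ C_1 ⊆ ... until stable
        changed = False
        for nt in list(c):
            for alt in rules.get(nt, []):
                # a chain rule: right hand side is a single non-terminal
                if len(alt) == 1 and alt.isupper() and alt not in c:
                    c.add(alt)
                    changed = True
    return c

print(sorted(chainable(rules, "S")))
```

Here $C(S) = \{S, A, B\}$, since $S \rightarrow A$ and $A \rightarrow B$ are both chain rules.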
Theorem 2 (Eliminate Chain Rules): For any CFG $G$ with no $\epsilon$-rules, there is an equivalent $G'$ such that, except for the possible $S' \rightarrow \epsilon$, all rules $X \rightarrow \alpha$ satisfy:
if $\alpha$ contains some non-terminal, then $|\alpha| \ge 2$.
Proof: This is to say we first eliminate all unit rules, those whose right hand side is a single non-terminal. Then we need to add back some rules to maintain the equivalence.
For a non-terminal $X$, consider its chainable set $C(X)$. For each $Y \in C(X)$, if there is some rule $Y \rightarrow \alpha$ where either $|\alpha| \ge 2$ or $\alpha$ is a single terminal, then we add a new rule $X \rightarrow \alpha$.
Final Transformation
After eliminating $\epsilon$-rules and chain rules, all the rules are of the forms:
$S \rightarrow \epsilon$ if $\epsilon \in L(G)$, and $S$ does not appear on the right hand side of any rule;
$X \rightarrow \alpha$, where either $\alpha$ is a single terminal, or $|\alpha| \ge 2$.
Theorem 3: For any CFG $G$, there is an equivalent $G'$ in Chomsky Normal Form.
Proof: We assume $G$ has no $\epsilon$-rules and no chain rules. First we deal with terminal productions. For a rule $X \rightarrow a$ where $a$ is a single terminal, we retain it. If the right hand side consists of terminals only but has length $n \ge 2$, i.e. $X \rightarrow a_1 a_2 \cdots a_n$, then we do the following:
create new non-terminals $P_1, \dots, P_{n-1}$ and $P'_1, \dots, P'_{n-1}$,
and create the following new rules:
$X \rightarrow P'_1 P_1$, $P_1 \rightarrow P'_2 P_2$, …, $P_{n-2} \rightarrow P'_{n-1} P_{n-1}$
$P'_1 \rightarrow a_1$, …, $P'_{n-1} \rightarrow a_{n-1}$, and $P_{n-1} \rightarrow a_n$
Now if in $G$ we have a rule $X \rightarrow \alpha$ where $\alpha$ contains some non-terminals, then we do the following:
assume $\alpha = u_0 Z_1 u_1 Z_2 u_2 \cdots Z_m u_m$, where $Z_1, \dots, Z_m$ are non-terminals and the $u_i$ are (possibly empty) terminal strings;
create new non-terminals $U_0, \dots, U_m$ and add rules $U_i \rightarrow u_i$ for each non-empty $u_i$. For each such rule, repeat the above process to transform it into CNF form. (If $u_i$ is empty, then ignore it.)
consider the rule $X \rightarrow U_0 Z_1 U_1 \cdots Z_m U_m$, whose right hand side is now a string $Y_1 \cdots Y_r$ of non-terminals with $r \ge 2$. We create new non-terminals $W_1, \dots, W_{r-2}$ and new rules:
$X \rightarrow Y_1 W_1$, $W_1 \rightarrow Y_2 W_2$, …, $W_{r-2} \rightarrow Y_{r-1} Y_r$
CYK Parser
For a CFG $G$ in Chomsky Normal Form, a CYK parser can both recognize and parse any string $w$. Its worst case takes $O(n^3 \cdot |G|)$ time, where $n = |w|$. In practice, almost no grammar is designed in CNF. And even if we can transform a grammar to CNF, a CYK parser can only parse a string into a CNF tree, not the original grammar's parse tree. Thus, CNF and CYK parsers are not popular in practice. We will not note any further details here.
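For completeness, the recognition half can be sketched in a few lines of dynamic programming. The grammar below and its encoding (separate lists of binary and terminal rules) are illustrative assumptions, not taken from the note:

```python
# Minimal CYK recognizer for a CNF grammar.
# binary:   list of (X, Y, Z) meaning X -> Y Z
# terminal: list of (X, a)    meaning X -> a

def cyk(word, binary, terminal, start="S"):
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = set of non-terminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(word):
        table[i][1] = {x for (x, t) in terminal if t == a}
    for l in range(2, n + 1):            # span length
        for i in range(n - l + 1):       # span start
            for k in range(1, l):        # split point
                for (x, y, z) in binary:
                    if y in table[i][k] and z in table[i + k][l - k]:
                        table[i][l].add(x)
    return start in table[0][n]

# Tiny example: S -> A B, A -> a, B -> b recognizes exactly "ab"
binary = [("S", "A", "B")]
terminal = [("A", "a"), ("B", "b")]
print(cyk("ab", binary, terminal), cyk("ba", binary, terminal))
```

The three nested loops over length, start, and split point are where the $O(n^3)$ comes from, with the rule scan contributing the $|G|$ factor.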
$\Sigma$ is the set of terminal symbols. It is also the alphabet.
$P$ is the set of production rules. It is a relation on $N \times V^*$. A production rule is something like $X \rightarrow \alpha$, where $\alpha$ can contain non-terminal or terminal symbols.
$S$ is the start symbol.
In simple words, a CFG always starts with $S$ and at each step rewrites a non-terminal symbol to some string according to some production rule, until all symbols are terminals.
We say a string (or sentence) $w \in \Sigma^*$ is accepted by a CFG $G$ iff $S \Rightarrow^* w$, i.e. $S$ derives $w$. The language accepted by $G$ is denoted by $L(G)$.
We can define: "$\alpha$ derives $\beta$", written $\alpha \Rightarrow^* \beta$, iff there is a finite chain of strings $\alpha_0, \alpha_1, \dots, \alpha_n$ such that $\alpha_0 = \alpha$, $\alpha_n = \beta$, and $\alpha_i \Rightarrow \alpha_{i+1}$ for $0 \le i < n$.
In the above, we still need to define what we mean by $\alpha \Rightarrow \beta$. In general, we say $\alpha \Rightarrow \beta$ to mean $\alpha$ derives $\beta$ in one step, satisfying:
$\alpha = uXv$, where $u, v \in V^*$ (note $u, v$ can be empty) and $X$ is a non-terminal;
$\beta = uwv$, where $X \rightarrow w$ is a production rule.
We say a derivation is leftmost if every step $\alpha_i \Rightarrow \alpha_{i+1}$ is of the form:
$\alpha_i = uXv$, where $u \in \Sigma^*$, i.e. there is no non-terminal in $u$;
$\alpha_{i+1} = uwv$, where $X \rightarrow w$ is a production rule.
We can define rightmost derivation likewise.
Now the first non-trivial result is that we can transform every derivation into a leftmost (or rightmost) derivation.
Leftmost (rightmost) derivations
Theorem 1: Suppose a CFG $G$ accepts $w$. Then there is a leftmost derivation such that $S \Rightarrow^* w$.
Proof: Since $w$ is accepted by $G$, there must be a derivation $S \Rightarrow^n w$. Here $n$ means the length of the derivation is $n$. We prove by induction on $n$.
Note we can't directly prove by induction that $S \Rightarrow^n w$ has a leftmost derivation. Rather, we claim: for any string $\alpha \in V^*$, if $\alpha \Rightarrow^n w$ then there is a leftmost derivation from $\alpha$ to $w$ of the same length.
The case of $n = 1$ is obvious. Suppose the claim holds for $n$. Now consider $n + 1$, with original derivation $\alpha \Rightarrow^{n+1} w$. We can write it as $\alpha \Rightarrow \alpha_1 \Rightarrow^n w$.
If the first step $\alpha \Rightarrow \alpha_1$ is leftmost, by the induction assumption there must be an equivalent leftmost derivation of the same length to replace $\alpha_1 \Rightarrow^n w$, so we are done.
If the first step is not leftmost, then we consider $\alpha_1 \Rightarrow^n w$, which by the induction assumption we may already take to be leftmost. So the first two steps must be of the following form:
$\alpha = uXvYz$, where $u$ consists of terminals only, and $X$, $Y$ are non-terminals;
$\alpha_1 = uXvyz$, where $Y \rightarrow y$ is a production rule (the non-leftmost first step);
$\alpha_2 = uxvyz$, where $X \rightarrow x$ is a production rule (the leftmost step that must come next).
Now we swap the order of the two rewrites: $\alpha = uXvYz \Rightarrow uxvYz \Rightarrow uxvyz$ is still a valid derivation, and clearly its first step is leftmost now. For the remaining derivation $uxvYz \Rightarrow^n w$, by the induction assumption we can find an equivalent leftmost derivation. So the whole new derivation is leftmost. This completes the proof.
The idea of measuring a set in $\mathbb{R}$ is to create a function $m: 2^{\mathbb{R}} \rightarrow [0, +\infty]$ with three important properties:
Any interval $I$ is measurable, with $m(I) = \ell(I)$, its length.
Additivity: if $E_1, E_2, \dots$ are mutually disjoint, then $m\big(\bigcup_k E_k\big) = \sum_k m(E_k)$.
Translation invariance: $m(E + y) = m(E)$, where $E + y$ means the set $\{x + y : x \in E\}$.
It turns out that such a measure on all subsets of $\mathbb{R}$ is not possible. At best we can define a measure that covers as many sets as possible. This measure is the Lebesgue measure.
We start by constructing a weaker notion named Lebesgue outer measure. It is defined for all sets in $\mathbb{R}$ but does not possess the additivity property. But it does have sub-additivity, i.e. if $E \subseteq \bigcup_k E_k$ then $m^*(E) \le \sum_k m^*(E_k)$.
For an open interval $I = (a, b)$, we use $\ell(I) = b - a$ to denote its length.
Definition 1 (Lebesgue Outer Measure): For any set $E$ in $\mathbb{R}$, its Lebesgue outer measure is defined as
$$m^*(E) = \inf\Big\{\sum_{k=1}^{\infty} \ell(I_k) : E \subseteq \bigcup_{k=1}^{\infty} I_k,\ I_k \text{ open intervals}\Big\}$$
That is, over all countable covers of $E$ by open intervals, we take the "minimum" total length.
You can easily prove that for any open $(a, b)$, closed $[a, b]$, or half-open-half-closed interval, its outer measure equals its length $b - a$.
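As a quick worked example of the definition, any countable set $E = \{x_1, x_2, \dots\}$ has outer measure zero: given $\epsilon > 0$, cover each $x_k$ by an open interval of length $\epsilon / 2^k$, so that

```latex
m^*(E) \;\le\; \sum_{k=1}^{\infty}
  \ell\!\left(\Big(x_k - \frac{\epsilon}{2^{k+1}},\; x_k + \frac{\epsilon}{2^{k+1}}\Big)\right)
  \;=\; \sum_{k=1}^{\infty} \frac{\epsilon}{2^{k}} \;=\; \epsilon,
```

and since $\epsilon$ is arbitrary, $m^*(E) = 0$. In particular $m^*(\mathbb{Q}) = 0$.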
Lebesgue outer measure has the properties of translation invariance and sub-additivity.
Theorem 1 (Translation Invariance): For any set $E$ and any real number $y$, we have $m^*(E + y) = m^*(E)$.
Proof: Note that this is true if $E$ is an interval. For any cover $\{I_k\}$ of $E$, the translated cover $\{I_k + y\}$ covers $E + y$ and has the same total length. So $m^*(E + y) \le m^*(E)$. The reverse inequality follows by translating $E + y$ back by $-y$.
Theorem 2 (Countable sub-additivity): If $E \subseteq \bigcup_{k=1}^{\infty} E_k$, then $m^*(E) \le \sum_{k=1}^{\infty} m^*(E_k)$.
Proof: For each $E_k$ and any $\epsilon > 0$, there exists an open cover $\{I_{k,j}\}_j$ of $E_k$ with $\sum_j \ell(I_{k,j}) \le m^*(E_k) + \epsilon/2^k$. Note $\{I_{k,j}\}_{k,j}$ covers $\bigcup_k E_k$ and hence $E$. So $m^*(E) \le \sum_k \sum_j \ell(I_{k,j}) \le \sum_k m^*(E_k) + \epsilon$. Since this holds for any $\epsilon > 0$, we must have $m^*(E) \le \sum_k m^*(E_k)$.
Lebesgue Measure
Lebesgue measurable sets form a $\sigma$-algebra
We say a set of subsets of $\mathbb{R}$ is a $\sigma$-algebra if it contains $\varnothing$ and $\mathbb{R}$, and is closed under both complementation and countable unions. The smallest $\sigma$-algebra that contains all open sets is called the Borel $\sigma$-algebra.
Lebesgue outer measure is defined on all subsets of $\mathbb{R}$. It does not have the countable additivity property. We will see that if we restrict the measure function to a suitable $\sigma$-algebra of subsets of $\mathbb{R}$, then we have the desired property. After this restriction, these sets become "Lebesgue measurable".
Formally, a set $E$ is said to be Lebesgue measurable if for any set $A$, we have
$$m^*(A) = m^*(A \cap E) + m^*(A \cap E^c)$$
By sub-additivity, it is always true that $m^*(A) \le m^*(A \cap E) + m^*(A \cap E^c)$. So to prove a set is measurable, we only need to show $m^*(A) \ge m^*(A \cap E) + m^*(A \cap E^c)$ for any set $A$.
Now we will see most "normal" sets are measurable, and these sets form a $\sigma$-algebra.
If a set $E$ is measurable, we simply use $m(E) = m^*(E)$ to denote its measure.
Theorem 3: If a set $E$ is measurable, so is its complement $E^c$.
Proof: Since $(E^c)^c = E$, for any set $A$ we have $m^*(A \cap E^c) + m^*(A \cap (E^c)^c) = m^*(A \cap E^c) + m^*(A \cap E) = m^*(A)$.
Theorem 4: Any set $E$ with outer measure $m^*(E) = 0$ is measurable.
Proof: Note for any set $A$, we have $m^*(A \cap E) = 0$ since $A \cap E \subseteq E$. To show measurability, we only need to show $m^*(A) \ge m^*(A \cap E^c)$.
Notice $A \cap E^c \subseteq A$. By monotonicity of the outer measure we have $m^*(A \cap E^c) \le m^*(A)$, as desired.
Theorem 5: $\varnothing$ and $\mathbb{R}$ are both measurable.
Proof: Since $m^*(\varnothing) = 0$, $\varnothing$ is measurable. By Theorem 3, $\mathbb{R} = \varnothing^c$ is also measurable.
Theorem 6 (Countable Additivity): If $E_1, \dots, E_n$ are disjoint and measurable, then $\bigcup_k E_k$ is also measurable and $m\big(\bigcup_k E_k\big) = \sum_k m(E_k)$. This is also true when $n$ is infinite.
Proof: We first prove the finite case, by induction on $n$. Consider $E_1 \cup E_2$ where $E_1, E_2$ are disjoint and measurable. For any set $A$, we let $A_1 = A \cap E_1$, $A_2 = A \cap E_2$ and $A_0 = A \setminus (E_1 \cup E_2)$. To show $E_1 \cup E_2$ is measurable, we need to show
$$m^*(A) \ge m^*(A_1 \cup A_2) + m^*(A_0) \tag{1}$$
Since $E_1$ is measurable, we have $m^*(A) = m^*(A_1) + m^*(A \cap E_1^c)$. Since $E_2$ is measurable, we have
$$m^*(A \cap E_1^c) = m^*(A_2) + m^*(A_0) \tag{2}$$
So we have $m^*(A) = m^*(A_1) + m^*(A_2) + m^*(A_0)$. By (1) and (2), it remains to show $m^*(A_1) + m^*(A_2) \ge m^*(A_1 \cup A_2)$, which is true by the outer measure's sub-additivity. Hence $E_1 \cup E_2$ is measurable.
Again since $E_1$ is measurable, taking $A = E_1 \cup E_2$ gives $m(E_1 \cup E_2) = m(E_1) + m(E_2)$. This proves the theorem for $n = 2$. It is easy to extend it to finitely many sets by induction, considering $E_1 \cup \cdots \cup E_n = (E_1 \cup \cdots \cup E_{n-1}) \cup E_n$.
To show the infinite union $E = \bigcup_{k=1}^{\infty} E_k$ is measurable, let $F_n = \bigcup_{k=1}^{n} E_k$. Each $F_n$ is measurable by the finite case, so for any set $A$,
$$m^*(A) = m^*(A \cap F_n) + m^*(A \cap F_n^c) \ge \sum_{k=1}^{n} m^*(A \cap E_k) + m^*(A \cap E^c),$$
using $F_n^c \supseteq E^c$ and the finite additivity over the disjoint pieces $A \cap E_k$ established above. Letting $n \rightarrow \infty$ and applying sub-additivity, $m^*(A) \ge \sum_{k=1}^{\infty} m^*(A \cap E_k) + m^*(A \cap E^c) \ge m^*(A \cap E) + m^*(A \cap E^c)$. This proves that $E$ is measurable.
Finally, to show $m(E) = \sum_k m(E_k)$, note that for any $n$ we must have $m(E) \ge m(F_n) = \sum_{k=1}^{n} m(E_k)$, so $m(E) \ge \sum_{k=1}^{\infty} m(E_k)$. Combining with the sub-additivity fact that $m(E) \le \sum_k m(E_k)$, we complete the proof.
Theorem 7 (closed under countable unions): If $E_1, E_2, \dots$ are measurable, $\bigcup_k E_k$ is also measurable.
Proof: This is easily seen from Theorem 6. Simply construct $F_1 = E_1$ and $F_k = E_k \setminus \bigcup_{j < k} E_j$ for $k \ge 2$. Then the $F_k$ are disjoint. By Theorems 3 and 6, they are all measurable. Again by Theorem 6, $\bigcup_k F_k = \bigcup_k E_k$ is also measurable.
Theorem 8: Any interval of the form $(a, +\infty)$ is measurable.
Proof: Consider a set $A$, and let $A_1 = A \cap (a, +\infty)$ and $A_2 = A \cap (a, +\infty)^c$. We may assume $a \notin A$; otherwise we can replace $A$ by $A \setminus \{a\}$, which changes no outer measure since a single point has outer measure zero.
Consider an open cover $\{I_k\}$ of $A$. For each $k$ let $I_k' = I_k \cap (a, +\infty)$ and $I_k'' = I_k \cap (-\infty, a)$. So $\ell(I_k) \ge \ell(I_k') + \ell(I_k'')$. $\{I_k'\}$ covers $A_1$ and $\{I_k''\}$ covers $A_2$.
Thus,
$$\sum_k \ell(I_k) \ge \sum_k \ell(I_k') + \sum_k \ell(I_k'') \ge m^*(A_1) + m^*(A_2) \tag{1}$$
(1) holds for any open cover $\{I_k\}$. Hence we have $m^*(A) \ge m^*(A_1) + m^*(A_2)$.
With Theorem 7 and the fact that measurable sets form a $\sigma$-algebra, we have
Theorem 8: Any Borel set is measurable.
Proof: This is easily shown by combining Theorems 3, 6, and 7: the measurable sets form a $\sigma$-algebra containing every interval $(a, +\infty)$, hence all open sets, and therefore all Borel sets.
Approximation
One of Littlewood's principles states: a Lebesgue measurable set is almost open.
We shall construct several approximations of a measurable set by open and closed sets.
Lemma 9 (Excision Property): If $A$ is measurable with finite measure, then for any $B$ containing $A$ we have $m^*(B \setminus A) = m^*(B) - m(A)$.
Proof: Since $A$ is measurable, $m^*(B) = m^*(B \cap A) + m^*(B \cap A^c) = m(A) + m^*(B \setminus A)$, using $A \subseteq B$. Finally, since $m(A)$ is finite, we can subtract it from both sides and obtain the result as desired.
Lemma 10: If $E$ is measurable with $m(E) = \infty$, then $E = \bigcup_{k=1}^{\infty} E_k$ where the $E_k$ are disjoint, measurable, of finite measure, and $m(E) = \sum_k m(E_k)$.
Proof: Take $E_1 = E \cap [-1, 1]$ and $E_k = E \cap \big([-k, k] \setminus [-(k-1), k-1]\big)$ for $k \ge 2$. Then each $E_k$ satisfies all the properties. By countable additivity, we obtain what we desire.
Theorem 11 (Outer and Inner Approximation): Let $E$ be a set of real numbers. Then each of the following statements is equivalent to the measurability of $E$.
Outer approximation:
For any $\epsilon > 0$, there is an open set $O$ containing $E$ for which $m^*(O \setminus E) < \epsilon$.
There is a $G_\delta$ set $G$, i.e. a countable intersection of open sets, containing $E$ for which $m^*(G \setminus E) = 0$.
Inner approximation:
For any $\epsilon > 0$, there is a closed set $F$ contained in $E$ for which $m^*(E \setminus F) < \epsilon$.
There is an $F_\sigma$ set $F$, i.e. a countable union of closed sets, contained in $E$ for which $m^*(E \setminus F) = 0$.
Proof: Note (1) is equivalent to (3) and (2) is equivalent to (4), by passing to complements. To see this, suppose (1) holds for $E^c$: for any $\epsilon > 0$ there is some open $O \supseteq E^c$ for which $m^*(O \setminus E^c) < \epsilon$. Take $F = O^c$. Then $F$ is closed, $F \subseteq E$, and $E \setminus F = O \setminus E^c$. So $m^*(E \setminus F) < \epsilon$. Similarly, if (2) holds for $E^c$ then (4) holds for $E$. Conversely, we can also easily show (3) implies (1) and (4) implies (2).
Since $E$ is measurable iff $E^c$ is, it suffices to establish that measurability of $E$ implies (1), that (1) implies (2), and that (2) implies measurability of $E$.
Suppose $E$ is measurable. First assume $m(E) < \infty$. For any $\epsilon > 0$ there is an open cover $\{I_k\}$ of $E$ such that $\sum_k \ell(I_k) < m(E) + \epsilon$. Let $O = \bigcup_k I_k$. Then $m^*(O) < m(E) + \epsilon$. By the Excision property, we have $m^*(O \setminus E) = m^*(O) - m(E) < \epsilon$.
Now suppose $m(E) = \infty$. By Lemma 10, $E = \bigcup_k E_k$ where the $E_k$ are disjoint and measurable with finite measures. Then (1) holds for each $E_k$ with some open $O_k$ satisfying $m^*(O_k \setminus E_k) < \epsilon/2^k$. So by taking $O = \bigcup_k O_k$, we have $m^*(O \setminus E) \le \sum_k m^*(O_k \setminus E_k) < \epsilon$.
To see (1) implies (2), consider $\epsilon = 1/n$. Then by (1) there is an open $O_n \supseteq E$ for which $m^*(O_n \setminus E) < 1/n$. Take $G = \bigcap_n O_n$. We have $E \subseteq G$ and $G \setminus E \subseteq O_n \setminus E$ for any $n$. Thus $m^*(G \setminus E) < 1/n$ for any $n$. Hence $m^*(G \setminus E) = 0$.
Finally, to see (2) implies measurability, note $G \setminus E$ is measurable since its outer measure is 0. Also $G$, being a $G_\delta$ set, is Borel and hence measurable. So $E = G \setminus (G \setminus E)$ is measurable.
Now we can show one of Littlewood's principles.
Theorem 12: If $E$ is measurable with finite measure, then for any $\epsilon > 0$ there is a finite union of open intervals $U$ such that $m^*(U \triangle E) < \epsilon$, where $U \triangle E = (U \setminus E) \cup (E \setminus U)$.
Proof: By Theorem 11, there is an open set $O \supseteq E$ s.t. $m^*(O \setminus E) < \epsilon/2$. Since $E$ is of finite measure, so is $O$ (by the Excision property). $O$ can be expressed as the union of a collection of disjoint open intervals, i.e. $O = \bigcup_k I_k$. Since $\sum_k \ell(I_k) = m(O) < \infty$, there is some $n$ s.t. $\sum_{k > n} \ell(I_k) < \epsilon/2$.
Now take $U = \bigcup_{k=1}^{n} I_k$. Then $U \setminus E \subseteq O \setminus E$ and thus $m^*(U \setminus E) < \epsilon/2$. On the other hand, since $E \subseteq O$ we have $E \setminus U \subseteq O \setminus U = \bigcup_{k > n} I_k$. Thus $m^*(E \setminus U) < \epsilon/2$, and $m^*(U \triangle E) < \epsilon$.
Continuity
The idea is that if $E_k \rightarrow E$, then $m(E_k) \rightarrow m(E)$. Formally, we have the following theorems.
Theorem 13: $\{E_k\}$ is an ascending sequence of measurable sets, i.e. $E_k \subseteq E_{k+1}$ for every $k$. Then $m\big(\bigcup_{k=1}^{\infty} E_k\big) = \lim_{k \rightarrow \infty} m(E_k)$.
Proof: If $m(E_{k_0}) = \infty$ for some $k_0$, then both sides equal $\infty$ and the theorem holds. So we may assume all $m(E_k) < \infty$.
Let $F_1 = E_1$ and $F_k = E_k \setminus E_{k-1}$ for $k \ge 2$. Then the $F_k$ are disjoint and $\bigcup_k F_k = \bigcup_k E_k$. Thus,
$$m\Big(\bigcup_k E_k\Big) = \sum_{k=1}^{\infty} m(F_k) = m(E_1) + \sum_{k=2}^{\infty} \big(m(E_k) - m(E_{k-1})\big) \tag{1}$$
The partial sums on the right hand side of (1) telescope to $m(E_n)$. Hence we have the desired result.
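A quick sanity check of Theorem 13 with a concrete ascending sequence:

```latex
E_k = \Big(0,\ 1 - \tfrac{1}{k}\Big), \qquad
\bigcup_{k=1}^{\infty} E_k = (0, 1), \qquad
\lim_{k \to \infty} m(E_k) = \lim_{k \to \infty} \Big(1 - \tfrac{1}{k}\Big) = 1 = m\big((0,1)\big).
```

The same sequence also shows why Theorem 14 below needs the finiteness hypothesis: the descending sets $E_k = (k, +\infty)$ all have infinite measure, yet their intersection is empty.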
Theorem 14: $\{E_k\}$ is a descending sequence of measurable sets with $m(E_1) < \infty$, i.e. $E_{k+1} \subseteq E_k$ for every $k$. Then $m\big(\bigcap_{k=1}^{\infty} E_k\big) = \lim_{k \rightarrow \infty} m(E_k)$.
Proof: Let $F_k = E_1 \setminus E_k$. Since $\{E_k\}$ is descending, $\{F_k\}$ is ascending. So $\bigcup_k F_k = E_1 \setminus \bigcap_k E_k$, and by Theorem 13
$$m\Big(E_1 \setminus \bigcap_k E_k\Big) = \lim_{k \rightarrow \infty} m(E_1 \setminus E_k) \tag{2}$$
By the Excision property, from (2) we have $m(E_1) - m\big(\bigcap_k E_k\big) = m(E_1) - \lim_{k \rightarrow \infty} m(E_k)$.
Since $m(E_1) < \infty$, we can cancel it from both sides and obtain the result as desired.
For a measurable set $E$, we say a property holds almost everywhere (a.e. for short) in $E$ if there is a measurable subset $E_0 \subseteq E$ s.t. $m(E_0) = 0$ and the property holds for all $x \in E \setminus E_0$.
Using Theorem 14, we can prove the Borel-Cantelli Lemma.
Borel-Cantelli Lemma: $\{E_k\}$ is a collection of measurable sets with $\sum_{k=1}^{\infty} m(E_k) < \infty$. Then almost every $x \in \mathbb{R}$ belongs to at most finitely many of the $E_k$.
Proof: In other words, almost no $x$ appears infinitely often in the $E_k$. This is equivalent to showing
$$m\Big(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k\Big) = 0$$